Covariance-Based Outlier Detection for Compositional Data with Structural Zeros: Application to Italian Survey of Household Income and Wealth Data

Authors

  • Gianna S. Monti
  • Karel Hron
  • Peter Filzmoser
  • Matthias Templ
Abstract

Outlier detection is an important task in the statistical analysis of multivariate data, because outliers often carry important information about the data structure. In compositional data, usually represented as proportions subject to a unit-sum constraint, the essential information is contained in the ratios between the parts (variables). This inherent property is, however, incompatible with the presence of zeros in compositions, since log-ratios involving a zero part are undefined. Here we consider structural zeros, i.e., zeros that are genuinely observed, as opposed to zeros that arise from measurement error (rounded zeros). To identify possible outliers in compositional data with structural zeros, we apply the Mahalanobis distance approach, where the key task is the robust estimation of the covariance matrix. The resulting outlier detection procedure is applied to the Italian Survey of Household Income and Wealth (SHIW) data, collected by the Bank of Italy.
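The Mahalanobis distance approach described above can be sketched for the simpler case of compositions without zeros. This is a minimal illustration under stated assumptions, not the authors' procedure: compositions are mapped to isometric log-ratio (pivot) coordinates, location and covariance are estimated with a simplified Minimum Covariance Determinant (MCD) estimator (one common robust choice; the abstract does not name a specific estimator here), and observations whose squared robust distance exceeds a chi-square quantile are flagged. The function names, the 0.975 cutoff, the number of random starts and the Wilson-Hilferty approximation of chi-square quantiles are all assumptions made for this sketch. Rows containing zeros raise an error, because the paper's treatment of structural zeros is not reproduced.

```python
import numpy as np
from statistics import NormalDist


def ilr(x):
    """Pivot (ilr) coordinates of strictly positive compositions: (n, D) -> (n, D-1)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        # Log-ratios are undefined for zero parts; this sketch does not handle structural zeros.
        raise ValueError("all parts must be strictly positive")
    logx = np.log(x)
    n, D = logx.shape
    z = np.empty((n, D - 1))
    for j in range(D - 1):
        # Part j against the geometric mean of the remaining parts j+1, ..., D-1.
        rest = logx[:, j + 1:].mean(axis=1)
        z[:, j] = np.sqrt((D - j - 1) / (D - j)) * (logx[:, j] - rest)
    return z


def chi2_quantile(q, df):
    """Wilson-Hilferty approximation of a chi-square quantile (assumption: avoids SciPy)."""
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + NormalDist().inv_cdf(q) * np.sqrt(a)) ** 3


def sq_mahalanobis(z, center, cov):
    """Squared Mahalanobis distance of every row of z to (center, cov)."""
    diff = z - center
    return np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)


def mcd(z, n_starts=50, max_csteps=30, seed=0):
    """Simplified FAST-MCD: random (p+1)-subsets refined by C-steps, no reweighting."""
    n, p = z.shape
    h = (n + p + 1) // 2
    rng = np.random.default_rng(seed)
    best_det, best_center, best_cov = np.inf, None, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=p + 1, replace=False)
        center = z[subset].mean(axis=0)
        cov = np.atleast_2d(np.cov(z[subset], rowvar=False))
        if np.linalg.matrix_rank(cov) < p:
            continue  # degenerate starting subset
        for _ in range(max_csteps):
            # C-step: keep the h points closest to the current estimate.
            new_subset = np.sort(np.argsort(sq_mahalanobis(z, center, cov))[:h])
            if np.array_equal(new_subset, subset):
                break  # the h-subset no longer changes
            subset = new_subset
            center = z[subset].mean(axis=0)
            cov = np.atleast_2d(np.cov(z[subset], rowvar=False))
        det = np.linalg.det(cov)
        if det < best_det:
            best_det, best_center, best_cov = det, center, cov
    # Rescale so the median squared distance matches the chi-square median.
    d2 = sq_mahalanobis(z, best_center, best_cov)
    return best_center, best_cov * np.median(d2) / chi2_quantile(0.5, p)


def flag_outliers(x, quantile=0.975):
    """Squared robust Mahalanobis distances in ilr coordinates and chi-square-based flags."""
    z = ilr(x)
    center, cov = mcd(z)
    d2 = sq_mahalanobis(z, center, cov)
    return d2, d2 > chi2_quantile(quantile, z.shape[1])
```

For example, `d2, flags = flag_outliers(x)` on an `(n, D)` array of positive shares returns the squared distances and a boolean mask of suspected outliers. Because the ilr coordinates are log-ratios, rescaling a row (e.g. closing it to sum to 1) does not change the result. For real analyses, established implementations such as `MinCovDet` in scikit-learn or `covMcd` in the R package robustbase add the reweighting and consistency corrections omitted from this sketch.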


Similar resources

Outlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator

The purpose of this paper is to identify the points that affect the performance of one of the important data mining algorithms, namely the support vector machine. The final classification decision is made based on a small portion of the data called support vectors. So, the existence of atypical observations among these points will result in deviation from the correct decision. Thus...


The Dynamics of Household Wealth Accumulation in Italy

We examine the dynamics of wealth accumulation distribution in Italy using data drawn from the Survey of Household Income and Wealth, a representative survey of the Italian population conducted by the Bank of Italy. We compare survey data with national accounts data and discuss sample representativeness, attrition, and measurement issues. We then look at wealth inequality (the cross-sectional d...


Multivariate outlier detection with compositional data

Multivariate outlier detection is usually based on Mahalanobis distances, by plugging in robust estimates of location and covariance. For compositional data, carrying only relative information, a special transformation needs to be consulted in order to be able to work in the appropriate geometry. The effect of the transformation is discussed in this contribution. Furthermore, different possibil...


Does consumption inequality track income inequality in Italy?

This paper presents stylized facts on labor supply, income, consumption, wealth, and several measures of consumption and income inequality drawn from the 1980-2006 Survey of Household Income and Wealth (SHIW) conducted by the Bank of Italy. The SHIW provides information on consumption, income and wealth, and a sizable panel component that allows econometricians to estimate sophisticated income, ...


Detecting Suspicious Card Transactions in unlabeled data of bank Using Outlier Detection Techniqes

With the advancement of technology, the use of ATMs and credit cards has increased. Cyber fraud and theft are the kinds of threat that result from using these technologies. It is therefore essential to use fraud detection algorithms to prevent fraudulent use of bank cards. Credit card fraud can be thought of as a form of identity theft that consists of an unauthorized access to another person's ...


Publication date: 2013